Accelerating Pipelined Integer and Floating-Point Accumulations in Configurable Hardware with Delayed Addition Techniques

نویسندگان

  • Zhen Luo
  • Margaret Martonosi
چکیده

ÐThe speed of arithmetic calculations in configurable hardware is limited by carry propagation, even with the dedicated hardware found in recent FPGAs. This paper proposes and evaluates an approach called delayed addition that reduces the carrypropagation bottleneck and improves the performance of arithmetic calculations. Our approach employs the idea used in Wallace trees to store the results in an intermediate form and delay addition until the end of a repeated calculation such as accumulation or dotproduct; this effectively removes carry propagation overhead from the calculation's critical path. We present both integer and floatingpoint designs that use our technique. Our pipelined integer multiply-accumulate (MAC) design is based on a fairly traditional multiplier design, but with delayed addition as well. This design achieves a 72MHz clock rate on an XC4036xla-9 FPGA and 170MHz clock rate on an XV300epq240-8 FPGA. Next, we present a 32-bit floating-point accumulator based on delayed addition. Here, delayed addition requires a novel alignment technique that decouples the incoming operands from the accumulated result. A conservative version of this design achieves a 40 MHz clock rate on an XC4036xla-9 FPGA and 97MHz clock rate on an XV100epq240-8 FPGA. We also present a 32-bit floating-point accumulator design with compiler-managed overflow avoidance that achieves a 80MHz clock rate on an XC4036xla-9 FPGA and 150MHz clock rate on an XCV100epq240-8 FPGA. Index TermsÐDelayed addition, accumulation, multiply-accumulate, MAC, FPGA.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using Delayed Addition Techniques to Accelerate Integer and Floating- Point Calculations in Configurable Hardware

This paper proposes and evaluates an approach for improving the performance of arithmetic calculations via delayed addition. Our approach employs the idea used in Wallace trees to delay addition until the end of a repeated calculation such as accumulation or dot-product; this effectively removes carry propagation overhead from the calculation’s critical path. We present integer and floating-poi...

متن کامل

A Configurable Floating-Point Discrete Hilbert Transform Processor for Accelerating the Calculation of Filter in Katsevich Formula

Katsevich formula is currently a hot topic for cone-beam computed tomography (CBCT). The filter in the formula can be computed by a regular discrete Hilbert transform (DHT). A configurable single precision floating-point (SPFP) DHT processor is proposed for accelerating the calculation of filter in Katsevich formula. The configurable processor is of memory based architecture with one pipelined ...

متن کامل

A Pipelined, Single Precision Floating-Point Logarithm Computation Unit in Hardware

A large number of scientific applications rely on the computing of logarithm. Thus, accelerating the speed of computing logarithms is significant and necessary. To this end, we present the realization of a pipelined Logarithm Computation Unit (LCU) in hardware that uses lookup table and interpolation techniques. The presented LCU supports single precision arithmetic with fixed accuracy and spee...

متن کامل

Rapid Implementation of Floating-point Computations Using Phase-Coherent Dynamically Configurable Pipelines

The Phase-Coherent Dynamically Configurable Pipeline is a concept for the rapid implementation of pipelined computational algorithms in configurable hardware. The approach allows a high level of sharing of floating-point resources among multiple computations. The concept features a simple tag-based control scheme and a sparse-pipeline allocation approach that enables all the stages of an arithm...

متن کامل

FPGA Optimizations for a Pipelined Floating-Point Exponential Unit

The large number of available DSP slices on new-generation FPGAs allows for efficient mapping and acceleration of floating-point intensive codes. Numerous scientific codes heavily rely on executing the exponential function. To this end, we present the design and implementation of a pipelined CORDIC/TD-based (COrdinate Rotation DIgital Computer/Table Driven) Exponential Approximation Unit (EAU) ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IEEE Trans. Computers

دوره 49  شماره 

صفحات  -

تاریخ انتشار 2000